Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 2988181 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 296.4 MiB |
| Average record size in memory | 104.0 B |
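The memory figures above can be cross-checked with simple arithmetic. The sketch below assumes every cell occupies 8 bytes (a 64-bit number, or an 8-byte object pointer for the two categorical columns); that per-cell size is an assumption made for illustration and is not stated in the report.

```python
# Figures copied from the "Dataset statistics" table.
N_ROWS = 2_988_181
N_COLS = 13
BYTES_PER_CELL = 8  # assumption: 64-bit value or object pointer per cell

MIB = 1024 ** 2

record_bytes = N_COLS * BYTES_PER_CELL        # bytes per row
column_mib = N_ROWS * BYTES_PER_CELL / MIB    # one column
total_mib = N_ROWS * record_bytes / MIB       # whole table

print(f"record size : {record_bytes} B")
print(f"per column  : {column_mib:.1f} MiB")
print(f"total       : {total_mib:.1f} MiB")
```

Under that assumption the results agree with the reported 104.0 B per record, 22.8 MiB per variable and 296.4 MiB in total.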
Variable types
| Numeric | 11 |
|---|---|
| Categorical | 2 |
session_start has a high cardinality: 646874 distinct values | High cardinality |
click_timestamp has a high cardinality: 2983198 distinct values | High cardinality |
Unnamed: 0 is highly correlated with session_id | High correlation |
session_id is highly correlated with Unnamed: 0 | High correlation |
click_deviceGroup is highly correlated with click_os | High correlation |
click_os is highly correlated with click_deviceGroup | High correlation |
Unnamed: 0 is highly correlated with user_id and 1 other fields | High correlation |
user_id is highly correlated with Unnamed: 0 and 1 other fields | High correlation |
session_id is highly correlated with Unnamed: 0 and 1 other fields | High correlation |
click_country is highly correlated with click_region | High correlation |
click_region is highly correlated with click_country | High correlation |
Unnamed: 0 is uniformly distributed | Uniform |
click_timestamp is uniformly distributed | Uniform |
Unnamed: 0 has unique values | Unique |
Reproduction
| Analysis started | 2022-08-28 12:12:50.057321 |
|---|---|
| Analysis finished | 2022-08-28 12:17:09.333827 |
| Duration | 4 minutes and 19.28 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
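The reported duration follows directly from the two timestamps. A minimal check using only the standard library (the format string is chosen here to match the timestamps as printed):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-08-28 12:12:50.057321", FMT)
finished = datetime.strptime("2022-08-28 12:17:09.333827", FMT)

# Elapsed wall-clock time, split into whole minutes and remaining seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```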
Unnamed: 0
Real number (ℝ≥0)
HIGH CORRELATION, UNIFORM, UNIQUE
| Distinct | 2988181 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1494090 |
| Minimum | 0 |
|---|---|
| Maximum | 2988180 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 149409 |
| Q1 | 747045 |
| median | 1494090 |
| Q3 | 2241135 |
| 95-th percentile | 2838771 |
| Maximum | 2988180 |
| Range | 2988180 |
| Interquartile range (IQR) | 1494090 |
Descriptive statistics
| Standard deviation | 862613.6967 |
|---|---|
| Coefficient of variation (CV) | 0.577350559 |
| Kurtosis | -1.2 |
| Mean | 1494090 |
| Median Absolute Deviation (MAD) | 747045 |
| Skewness | -1.014664281 × 10⁻¹⁵ |
| Sum | 4.46461135 × 10¹² |
| Variance | 7.441023897 × 10¹¹ |
| Monotonicity | Strictly increasing |
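Because this column is unique, strictly increasing and runs from 0 to 2988180, it behaves like a plain row index 0, 1, ..., n - 1, and its statistics have closed forms. The sketch below assumes exactly that structure and assumes the reported variance is the sample variance (divisor n - 1).

```python
import math

n = 2_988_181  # number of observations; values assumed to be 0, 1, ..., n - 1

mean = (n - 1) / 2                # midpoint of the range
variance = n * (n + 1) / 12       # sample variance (divisor n - 1) of 0..n-1
std = math.sqrt(variance)
cv = std / mean                   # tends to 1/sqrt(3) for large n
iqr = (n - 1) / 2                 # Q3 - Q1 for evenly spaced values

print(mean, round(std, 4), round(cv, 9), iqr)
```

These values agree with the table: mean 1494090, standard deviation about 862613.6967, CV about 0.577350559 and IQR 1494090.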
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1992114 | 1 | < 0.1% |
| 1992116 | 1 | < 0.1% |
| 1992117 | 1 | < 0.1% |
| 1992118 | 1 | < 0.1% |
| 1992119 | 1 | < 0.1% |
| 1992120 | 1 | < 0.1% |
| 1992121 | 1 | < 0.1% |
| 1992122 | 1 | < 0.1% |
| 1992123 | 1 | < 0.1% |
| Other values (2988171) | 2988171 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 2988180 | 1 | |
| 2988179 | 1 | |
| 2988178 | 1 | |
| 2988177 | 1 | |
| 2988176 | 1 | |
| 2988175 | 1 | |
| 2988174 | 1 | |
| 2988173 | 1 | |
| 2988172 | 1 | |
| 2988171 | 1 |
user_id
Real number (ℝ≥0)
| Distinct | 322897 |
|---|---|
| Distinct (%) | 10.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 107947.8258 |
| Minimum | 0 |
|---|---|
| Maximum | 322896 |
| Zeros | 8 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 6370 |
| Q1 | 40341 |
| median | 86229 |
| Q3 | 163261 |
| 95-th percentile | 274162 |
| Maximum | 322896 |
| Range | 322896 |
| Interquartile range (IQR) | 122920 |
Descriptive statistics
| Standard deviation | 83648.36147 |
|---|---|
| Coefficient of variation (CV) | 0.7748962136 |
| Kurtosis | -0.4686650537 |
| Mean | 107947.8258 |
| Median Absolute Deviation (MAD) | 57248 |
| Skewness | 0.7231189115 |
| Sum | 3.22567642 × 10¹¹ |
| Variance | 6997048377 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 5890 | 1232 | < 0.1% |
| 73574 | 939 | < 0.1% |
| 15867 | 900 | < 0.1% |
| 80350 | 783 | < 0.1% |
| 15275 | 746 | < 0.1% |
| 2151 | 722 | < 0.1% |
| 4568 | 529 | < 0.1% |
| 12897 | 513 | < 0.1% |
| 11521 | 502 | < 0.1% |
| 34541 | 501 | < 0.1% |
| Other values (322887) | 2980814 |
| Value | Count | Frequency (%) |
| 0 | 8 | < 0.1% |
| 1 | 12 | < 0.1% |
| 2 | 4 | < 0.1% |
| 3 | 17 | < 0.1% |
| 4 | 7 | < 0.1% |
| 5 | 87 | |
| 6 | 35 | |
| 7 | 22 | < 0.1% |
| 8 | 56 | |
| 9 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 322896 | 2 | |
| 322895 | 2 | |
| 322894 | 2 | |
| 322893 | 2 | |
| 322892 | 2 | |
| 322891 | 2 | |
| 322890 | 2 | |
| 322889 | 2 | |
| 322888 | 2 | |
| 322887 | 3 |
session_id
Real number (ℝ≥0)
| Distinct | 1048594 |
|---|---|
| Distinct (%) | 35.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.507472228 × 10¹⁵ |
| Minimum | 1.506825423 × 10¹⁵ |
|---|---|
| Maximum | 1.508211379 × 10¹⁵ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 1.506825423 × 10¹⁵ |
|---|---|
| 5-th percentile | 1.506941766 × 10¹⁵ |
| Q1 | 1.507124152 × 10¹⁵ |
| median | 1.50749334 × 10¹⁵ |
| Q3 | 1.507749414 × 10¹⁵ |
| 95-th percentile | 1.508153221 × 10¹⁵ |
| Maximum | 1.508211379 × 10¹⁵ |
| Range | 1.385955918 × 10¹² |
| Interquartile range (IQR) | 6.252618534 × 10¹¹ |
Descriptive statistics
| Standard deviation | 3.855244602 × 10¹¹ |
|---|---|
| Coefficient of variation (CV) | 0.0002557423301 |
| Kurtosis | -1.111389169 |
| Mean | 1.507472228 × 10¹⁵ |
| Median Absolute Deviation (MAD) | 3.329949664 × 10¹¹ |
| Skewness | 0.1807598817 |
| Sum | 3.594316782 × 10¹⁸ |
| Variance | 1.486291094 × 10²³ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1.507563658 × 10¹⁵ | 124 | < 0.1% |
| 1.507896573 × 10¹⁵ | 107 | < 0.1% |
| 1.507133568 × 10¹⁵ | 106 | < 0.1% |
| 1.507309773 × 10¹⁵ | 98 | < 0.1% |
| 1.508112331 × 10¹⁵ | 94 | < 0.1% |
| 1.507647366 × 10¹⁵ | 92 | < 0.1% |
| 1.507475404 × 10¹⁵ | 86 | < 0.1% |
| 1.506959499 × 10¹⁵ | 82 | < 0.1% |
| 1.508154737 × 10¹⁵ | 79 | < 0.1% |
| 1.506999909 × 10¹⁵ | 75 | < 0.1% |
| Other values (1048584) | 2987238 |
| Value | Count | Frequency (%) |
| 1.506825423 × 10¹⁵ | 2 | |
| 1.506825426 × 10¹⁵ | 2 | |
| 1.506825435 × 10¹⁵ | 2 | |
| 1.506825443 × 10¹⁵ | 2 | |
| 1.506825528 × 10¹⁵ | 2 | |
| 1.506825541 × 10¹⁵ | 3 | |
| 1.506825553 × 10¹⁵ | 2 | |
| 1.506825568 × 10¹⁵ | 2 | |
| 1.506825573 × 10¹⁵ | 3 | |
| 1.506825599 × 10¹⁵ | 2 |
| Value | Count | Frequency (%) |
| 1.508211379 × 10¹⁵ | 2 | < 0.1% |
| 1.508211376 × 10¹⁵ | 2 | < 0.1% |
| 1.508211372 × 10¹⁵ | 2 | < 0.1% |
| 1.508211369 × 10¹⁵ | 7 | |
| 1.508211367 × 10¹⁵ | 2 | < 0.1% |
| 1.508211353 × 10¹⁵ | 4 | |
| 1.508211348 × 10¹⁵ | 2 | < 0.1% |
| 1.508211326 × 10¹⁵ | 2 | < 0.1% |
| 1.508211326 × 10¹⁵ | 4 | |
| 1.508211324 × 10¹⁵ | 2 | < 0.1% |
session_start
Categorical
| Distinct | 646874 |
|---|---|
| Distinct (%) | 21.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.8 MiB |
| 2017-10-09 15:40:57 | 127 |
|---|---|
| 2017-10-13 12:09:33 | 112 |
| 2017-10-04 16:12:47 | 108 |
| 2017-10-06 17:09:33 | 98 |
| 2017-10-10 14:56:06 | 97 |
| Other values (646869) |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
Characters and Unicode
| Total characters | 56775439 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 4 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2017-10-01 02:37:03 |
|---|---|
| 2nd row | 2017-10-01 02:37:03 |
| 3rd row | 2017-10-01 02:37:06 |
| 4th row | 2017-10-01 02:37:06 |
| 5th row | 2017-10-01 02:37:15 |
Common Values
| Value | Count | Frequency (%) |
| 2017-10-09 15:40:57 | 127 | < 0.1% |
| 2017-10-13 12:09:33 | 112 | < 0.1% |
| 2017-10-04 16:12:47 | 108 | < 0.1% |
| 2017-10-06 17:09:33 | 98 | < 0.1% |
| 2017-10-10 14:56:06 | 97 | < 0.1% |
| 2017-10-16 00:05:31 | 96 | < 0.1% |
| 2017-10-02 15:51:39 | 87 | < 0.1% |
| 2017-10-10 16:05:43 | 87 | < 0.1% |
| 2017-10-08 15:10:03 | 86 | < 0.1% |
| 2017-10-16 11:52:17 | 85 | < 0.1% |
| Other values (646864) | 2987198 |
Length
| Value | Count | Frequency (%) |
| 2017-10-02 | 305709 | 5.1% |
| 2017-10-10 | 281384 | 4.7% |
| 2017-10-03 | 259709 | 4.3% |
| 2017-10-09 | 249856 | 4.2% |
| 2017-10-11 | 238521 | 4.0% |
| 2017-10-04 | 215267 | 3.6% |
| 2017-10-06 | 207537 | 3.5% |
| 2017-10-16 | 190891 | 3.2% |
| 2017-10-05 | 190074 | 3.2% |
| 2017-10-13 | 180599 | 3.0% |
| Other values (83818) | 3656815 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 11434946 | |
| 0 | 10649945 | |
| 2 | 5982278 | |
| - | 5976362 | |
| : | 5976362 | |
| 7 | 3949992 | 7.0% |
| (space) | 2988181 | 5.3% |
| 3 | 2391815 | 4.2% |
| 4 | 2104829 | 3.7% |
| 5 | 2074593 | 3.7% |
| Other values (3) | 3246136 | 5.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 41834534 | |
| Dash Punctuation | 5976362 | 10.5% |
| Other Punctuation | 5976362 | 10.5% |
| Space Separator | 2988181 | 5.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 11434946 | |
| 0 | 10649945 | |
| 2 | 5982278 | |
| 7 | 3949992 | 9.4% |
| 3 | 2391815 | 5.7% |
| 4 | 2104829 | 5.0% |
| 5 | 2074593 | 5.0% |
| 6 | 1210846 | 2.9% |
| 9 | 1111614 | 2.7% |
| 8 | 923676 | 2.2% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5976362 |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 5976362 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2988181 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 56775439 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 11434946 | |
| 0 | 10649945 | |
| 2 | 5982278 | |
| - | 5976362 | |
| : | 5976362 | |
| 7 | 3949992 | 7.0% |
| (space) | 2988181 | 5.3% |
| 3 | 2391815 | 4.2% |
| 4 | 2104829 | 3.7% |
| 5 | 2074593 | 3.7% |
| Other values (3) | 3246136 | 5.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 56775439 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 11434946 | |
| 0 | 10649945 | |
| 2 | 5982278 | |
| - | 5976362 | |
| : | 5976362 | |
| 7 | 3949992 | 7.0% |
| (space) | 2988181 | 5.3% |
| 3 | 2391815 | 4.2% |
| 4 | 2104829 | 3.7% |
| 5 | 2074593 | 3.7% |
| Other values (3) | 3246136 | 5.7% |
session_size
Real number (ℝ≥0)
| Distinct | 72 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.901885127 |
| Minimum | 2 |
|---|---|
| Maximum | 124 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 2 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 9 |
| Maximum | 124 |
| Range | 122 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 3.929941495 |
|---|---|
| Coefficient of variation (CV) | 1.007190465 |
| Kurtosis | 158.4608899 |
| Mean | 3.901885127 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 9.090074854 |
| Sum | 11659539 |
| Variance | 15.44444016 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 1260372 | |
| 3 | 670185 | |
| 4 | 374240 | 12.5% |
| 5 | 220105 | 7.4% |
| 6 | 135762 | 4.5% |
| 7 | 88354 | 3.0% |
| 8 | 58544 | 2.0% |
| 9 | 40878 | 1.4% |
| 10 | 29530 | 1.0% |
| 11 | 21714 | 0.7% |
| Other values (62) | 88497 | 3.0% |
| Value | Count | Frequency (%) |
| 2 | 1260372 | |
| 3 | 670185 | |
| 4 | 374240 | 12.5% |
| 5 | 220105 | 7.4% |
| 6 | 135762 | 4.5% |
| 7 | 88354 | 3.0% |
| 8 | 58544 | 2.0% |
| 9 | 40878 | 1.4% |
| 10 | 29530 | 1.0% |
| 11 | 21714 | 0.7% |
| Value | Count | Frequency (%) |
| 124 | 124 | |
| 107 | 107 | |
| 106 | 106 | |
| 98 | 98 | |
| 94 | 94 | |
| 92 | 92 | |
| 86 | 86 | |
| 82 | 82 | |
| 79 | 79 | |
| 75 | 75 |
click_article_id
Real number (ℝ≥0)
| Distinct | 46033 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 194922.6487 |
| Minimum | 3 |
|---|---|
| Maximum | 364046 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 42223 |
| Q1 | 124228 |
| median | 202381 |
| Q3 | 277067 |
| 95-th percentile | 336254 |
| Maximum | 364046 |
| Range | 364043 |
| Interquartile range (IQR) | 152839 |
Descriptive statistics
| Standard deviation | 90768.42147 |
|---|---|
| Coefficient of variation (CV) | 0.4656638009 |
| Kurtosis | -0.943045904 |
| Mean | 194922.6487 |
| Median Absolute Deviation (MAD) | 77632 |
| Skewness | -0.1234365434 |
| Sum | 5.824641553 × 10¹¹ |
| Variance | 8238906336 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 160974 | 37213 | 1.2% |
| 272143 | 28943 | 1.0% |
| 336221 | 23851 | 0.8% |
| 234698 | 23499 | 0.8% |
| 123909 | 23122 | 0.8% |
| 336223 | 21855 | 0.7% |
| 96210 | 21577 | 0.7% |
| 162655 | 21062 | 0.7% |
| 183176 | 20303 | 0.7% |
| 168623 | 19526 | 0.7% |
| Other values (46023) | 2747230 |
| Value | Count | Frequency (%) |
| 3 | 1 | |
| 27 | 1 | |
| 69 | 1 | |
| 81 | 2 | |
| 84 | 1 | |
| 94 | 2 | |
| 115 | 2 | |
| 125 | 1 | |
| 137 | 1 | |
| 139 | 1 |
| Value | Count | Frequency (%) |
| 364046 | 2 | < 0.1% |
| 364043 | 8 | < 0.1% |
| 364028 | 1 | < 0.1% |
| 364022 | 1 | < 0.1% |
| 364017 | 22 | |
| 364015 | 1 | < 0.1% |
| 364014 | 1 | < 0.1% |
| 364013 | 1 | < 0.1% |
| 364012 | 1 | < 0.1% |
| 364001 | 4 | < 0.1% |
click_timestamp
Categorical
| Distinct | 2983198 |
|---|---|
| Distinct (%) | 99.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.8 MiB |
| 2017-10-02 16:16:49.961 | 3 |
|---|---|
| 2017-10-06 20:07:23.928 | 3 |
| 2017-10-13 14:39:48.690 | 3 |
| 2017-10-16 14:42:54.899 | 3 |
| 2017-10-14 12:28:25.656 | 3 |
| Other values (2983193) |
Length
| Max length | 23 |
|---|---|
| Median length | 23 |
| Mean length | 23 |
| Min length | 23 |
Characters and Unicode
| Total characters | 68728163 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 4 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
Unique
| Unique | 2978224 ? |
|---|---|
| Unique (%) | 99.7% |
Sample
| 1st row | 2017-10-01 03:00:28.020 |
|---|---|
| 2nd row | 2017-10-01 03:00:58.020 |
| 3rd row | 2017-10-01 03:03:37.951 |
| 4th row | 2017-10-01 03:04:07.951 |
| 5th row | 2017-10-01 03:04:50.575 |
Common Values
| Value | Count | Frequency (%) |
| 2017-10-02 16:16:49.961 | 3 | < 0.1% |
| 2017-10-06 20:07:23.928 | 3 | < 0.1% |
| 2017-10-13 14:39:48.690 | 3 | < 0.1% |
| 2017-10-16 14:42:54.899 | 3 | < 0.1% |
| 2017-10-14 12:28:25.656 | 3 | < 0.1% |
| 2017-10-03 17:40:48.643 | 3 | < 0.1% |
| 2017-10-02 20:16:02.256 | 3 | < 0.1% |
| 2017-10-09 13:01:34.045 | 3 | < 0.1% |
| 2017-10-02 14:54:37.261 | 3 | < 0.1% |
| 2017-10-15 21:06:30.958 | 2 | < 0.1% |
| Other values (2983188) | 2988152 |
Length
| Value | Count | Frequency (%) |
| 2017-10-02 | 303177 | 5.1% |
| 2017-10-10 | 282391 | 4.7% |
| 2017-10-03 | 261159 | 4.4% |
| 2017-10-09 | 248208 | 4.2% |
| 2017-10-11 | 238969 | 4.0% |
| 2017-10-04 | 215415 | 3.6% |
| 2017-10-06 | 207646 | 3.5% |
| 2017-10-05 | 190003 | 3.2% |
| 2017-10-16 | 189779 | 3.2% |
| 2017-10-13 | 180723 | 3.0% |
| Other values (2923727) | 3658892 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 12265879 | |
| 0 | 11533489 | |
| 2 | 6914494 | |
| - | 5976362 | |
| : | 5976362 | |
| 7 | 4849332 | 7.1% |
| 3 | 3299611 | 4.8% |
| 4 | 3017099 | 4.4% |
| (space) | 2988181 | 4.3% |
| . | 2988181 | 4.3% |
| Other values (4) | 8919173 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 50799077 | |
| Other Punctuation | 8964543 | 13.0% |
| Dash Punctuation | 5976362 | 8.7% |
| Space Separator | 2988181 | 4.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 12265879 | |
| 0 | 11533489 | |
| 2 | 6914494 | |
| 7 | 4849332 | 9.5% |
| 3 | 3299611 | 6.5% |
| 4 | 3017099 | 5.9% |
| 5 | 2978574 | 5.9% |
| 6 | 2106971 | 4.1% |
| 9 | 2008543 | 4.0% |
| 8 | 1825085 | 3.6% |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 5976362 | |
| . | 2988181 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5976362 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2988181 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 68728163 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 12265879 | |
| 0 | 11533489 | |
| 2 | 6914494 | |
| - | 5976362 | |
| : | 5976362 | |
| 7 | 4849332 | 7.1% |
| 3 | 3299611 | 4.8% |
| 4 | 3017099 | 4.4% |
| (space) | 2988181 | 4.3% |
| . | 2988181 | 4.3% |
| Other values (4) | 8919173 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 68728163 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 12265879 | |
| 0 | 11533489 | |
| 2 | 6914494 | |
| - | 5976362 | |
| : | 5976362 | |
| 7 | 4849332 | 7.1% |
| 3 | 3299611 | 4.8% |
| 4 | 3017099 | 4.4% |
| (space) | 2988181 | 4.3% |
| . | 2988181 | 4.3% |
| Other values (4) | 8919173 |
click_environment
Real number (ℝ≥0)
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.942652068 |
| Minimum | 1 |
|---|---|
| Maximum | 4 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 4 |
| median | 4 |
| Q3 | 4 |
| 95-th percentile | 4 |
| Maximum | 4 |
| Range | 3 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.339680408 |
|---|---|
| Coefficient of variation (CV) | 0.0861553092 |
| Kurtosis | 33.01323632 |
| Mean | 3.942652068 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -5.848728196 |
| Sum | 11781358 |
| Variance | 0.1153827796 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 4 | 2904478 | |
| 2 | 79743 | 2.7% |
| 1 | 3960 | 0.1% |
| Value | Count | Frequency (%) |
| 1 | 3960 | 0.1% |
| 2 | 79743 | 2.7% |
| 4 | 2904478 |
| Value | Count | Frequency (%) |
| 4 | 2904478 | |
| 2 | 79743 | 2.7% |
| 1 | 3960 | 0.1% |
click_deviceGroup
Real number (ℝ≥0)
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.819305792 |
| Minimum | 1 |
|---|---|
| Maximum | 5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 3 |
| 95-th percentile | 3 |
| Maximum | 5 |
| Range | 4 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.042213782 |
|---|---|
| Coefficient of variation (CV) | 0.5728634442 |
| Kurtosis | -1.427040365 |
| Mean | 1.819305792 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.5763858618 |
| Sum | 5436415 |
| Variance | 1.086209567 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 1823162 | |
| 3 | 1047086 | |
| 4 | 117640 | 3.9% |
| 5 | 283 | < 0.1% |
| 2 | 10 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 1823162 | |
| 2 | 10 | < 0.1% |
| 3 | 1047086 | |
| 4 | 117640 | 3.9% |
| 5 | 283 | < 0.1% |
| Value | Count | Frequency (%) |
| 5 | 283 | < 0.1% |
| 4 | 117640 | 3.9% |
| 3 | 1047086 | |
| 2 | 10 | < 0.1% |
| 1 | 1823162 |
click_os
Real number (ℝ≥0)
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.27760333 |
| Minimum | 2 |
|---|---|
| Maximum | 20 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 2 |
| median | 17 |
| Q3 | 17 |
| 95-th percentile | 20 |
| Maximum | 20 |
| Range | 18 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 6.881718417 |
|---|---|
| Coefficient of variation (CV) | 0.5182952258 |
| Kurtosis | -0.9317514661 |
| Mean | 13.27760333 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -0.9541171292 |
| Sum | 39675882 |
| Variance | 47.35804837 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 17 | 1738138 | |
| 2 | 788699 | |
| 20 | 369586 | 12.4% |
| 12 | 60096 | 2.0% |
| 13 | 23711 | 0.8% |
| 19 | 6384 | 0.2% |
| 5 | 1513 | 0.1% |
| 3 | 54 | < 0.1% |
| Value | Count | Frequency (%) |
| 2 | 788699 | |
| 3 | 54 | < 0.1% |
| 5 | 1513 | 0.1% |
| 12 | 60096 | 2.0% |
| 13 | 23711 | 0.8% |
| 17 | 1738138 | |
| 19 | 6384 | 0.2% |
| 20 | 369586 | 12.4% |
| Value | Count | Frequency (%) |
| 20 | 369586 | 12.4% |
| 19 | 6384 | 0.2% |
| 17 | 1738138 | |
| 13 | 23711 | 0.8% |
| 12 | 60096 | 2.0% |
| 5 | 1513 | 0.1% |
| 3 | 54 | < 0.1% |
| 2 | 788699 |
click_country
Real number (ℝ≥0)
| Distinct | 11 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.357656046 |
| Minimum | 1 |
|---|---|
| Maximum | 11 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 11 |
| Range | 10 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.725860976 |
|---|---|
| Coefficient of variation (CV) | 1.271206342 |
| Kurtosis | 21.55275991 |
| Mean | 1.357656046 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.802252338 |
| Sum | 4056922 |
| Variance | 2.978596109 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 2852406 | |
| 10 | 61377 | 2.1% |
| 11 | 29999 | 1.0% |
| 8 | 9556 | 0.3% |
| 6 | 7256 | 0.2% |
| 9 | 6746 | 0.2% |
| 2 | 6101 | 0.2% |
| 3 | 4540 | 0.2% |
| 5 | 3498 | 0.1% |
| 4 | 3389 | 0.1% |
| Value | Count | Frequency (%) |
| 1 | 2852406 | |
| 2 | 6101 | 0.2% |
| 3 | 4540 | 0.2% |
| 4 | 3389 | 0.1% |
| 5 | 3498 | 0.1% |
| 6 | 7256 | 0.2% |
| 7 | 3313 | 0.1% |
| 8 | 9556 | 0.3% |
| 9 | 6746 | 0.2% |
| 10 | 61377 | 2.1% |
| Value | Count | Frequency (%) |
| 11 | 29999 | |
| 10 | 61377 | |
| 9 | 6746 | 0.2% |
| 8 | 9556 | 0.3% |
| 7 | 3313 | 0.1% |
| 6 | 7256 | 0.2% |
| 5 | 3498 | 0.1% |
| 4 | 3389 | 0.1% |
| 3 | 4540 | 0.2% |
| 2 | 6101 | 0.2% |
click_region
Real number (ℝ≥0)
| Distinct | 28 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.31331435 |
| Minimum | 1 |
|---|---|
| Maximum | 28 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 13 |
| median | 21 |
| Q3 | 25 |
| 95-th percentile | 27 |
| Maximum | 28 |
| Range | 27 |
| Interquartile range (IQR) | 12 |
Descriptive statistics
| Standard deviation | 7.064006436 |
|---|---|
| Coefficient of variation (CV) | 0.3857306383 |
| Kurtosis | -0.9755078164 |
| Mean | 18.31331435 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -0.545880017 |
| Sum | 54723498 |
| Variance | 49.90018693 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 25 | 804985 | |
| 21 | 464230 | |
| 13 | 320957 | 10.7% |
| 8 | 179339 | 6.0% |
| 16 | 164884 | 5.5% |
| 28 | 135793 | 4.5% |
| 24 | 130537 | 4.4% |
| 20 | 120884 | 4.0% |
| 5 | 96979 | 3.2% |
| 9 | 84693 | 2.8% |
| Other values (18) | 484900 |
| Value | Count | Frequency (%) |
| 1 | 7110 | 0.2% |
| 2 | 16728 | 0.6% |
| 3 | 3997 | 0.1% |
| 4 | 30265 | 1.0% |
| 5 | 96979 | |
| 6 | 57254 | 1.9% |
| 7 | 64062 | 2.1% |
| 8 | 179339 | |
| 9 | 84693 | |
| 10 | 21995 | 0.7% |
| Value | Count | Frequency (%) |
| 28 | 135793 | 4.5% |
| 27 | 18711 | 0.6% |
| 26 | 18893 | 0.6% |
| 25 | 804985 | |
| 24 | 130537 | 4.4% |
| 23 | 43 | < 0.1% |
| 22 | 13101 | 0.4% |
| 21 | 464230 | |
| 20 | 120884 | 4.0% |
| 19 | 34092 | 1.1% |
click_referrer_type
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.838981307 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.15635571 |
|---|---|
| Coefficient of variation (CV) | 0.628802319 |
| Kurtosis | 9.117533472 |
| Mean | 1.838981307 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.83996653 |
| Sum | 5495209 |
| Variance | 1.337158529 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 1602601 | |
| 1 | 1194321 | |
| 5 | 80766 | 2.7% |
| 7 | 69798 | 2.3% |
| 6 | 20455 | 0.7% |
| 4 | 19820 | 0.7% |
| 3 | 420 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 1194321 | |
| 2 | 1602601 | |
| 3 | 420 | < 0.1% |
| 4 | 19820 | 0.7% |
| 5 | 80766 | 2.7% |
| 6 | 20455 | 0.7% |
| 7 | 69798 | 2.3% |
| Value | Count | Frequency (%) |
| 7 | 69798 | 2.3% |
| 6 | 20455 | 0.7% |
| 5 | 80766 | 2.7% |
| 4 | 19820 | 0.7% |
| 3 | 420 | < 0.1% |
| 2 | 1602601 | |
| 1 | 1194321 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
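The rank-based definition can be sketched in plain Python as follows. The helper names (`average_ranks`, `pearson`, `spearman`) and the toy data are illustrative only and are not taken from the report or from pandas-profiling; ties receive the mean of their ranks, which is a common convention assumed here.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Pearson correlation applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

On this toy data ρ is 1 up to floating-point rounding, while r stays below 1 because the relationship is monotonic but not linear.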
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
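A minimal sketch of that calculation, together with a check of the location and scale invariance, is given below. The function name and the sample values are made up for illustration; the scale factors are positive, since a negative factor would flip the sign of r.

```python
from math import sqrt

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson(x, y)

# Shift and rescale each variable separately; r should not change.
x_transformed = [10 * a + 3 for a in x]
y_transformed = [0.5 * b - 7 for b in y]
r_transformed = pearson(x_transformed, y_transformed)

print(r, r_transformed)
```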
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
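The pair-counting description corresponds to the variant usually called τ-a. The sketch below implements that variant directly; the function name and data are illustrative, and library implementations often apply a tie correction (τ-b) that this sketch does not include.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # two adjacent swaps relative to x
tau = kendall_tau_a(x, y)
print(tau)  # 8 concordant and 2 discordant pairs out of 10 -> 0.6
```

The quadratic loop over all pairs is fine for illustration but would be far too slow for the roughly three million rows profiled here.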
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | Unnamed: 0 | user_id | session_id | session_start | session_size | click_article_id | click_timestamp | click_environment | click_deviceGroup | click_os | click_country | click_region | click_referrer_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1506825423271737 | 2017-10-01 02:37:03 | 2 | 157541 | 2017-10-01 03:00:28.020 | 4 | 3 | 20 | 1 | 20 | 2 |
| 1 | 1 | 0 | 1506825423271737 | 2017-10-01 02:37:03 | 2 | 68866 | 2017-10-01 03:00:58.020 | 4 | 3 | 20 | 1 | 20 | 2 |
| 2 | 2 | 1 | 1506825426267738 | 2017-10-01 02:37:06 | 2 | 235840 | 2017-10-01 03:03:37.951 | 4 | 1 | 17 | 1 | 16 | 2 |
| 3 | 3 | 1 | 1506825426267738 | 2017-10-01 02:37:06 | 2 | 96663 | 2017-10-01 03:04:07.951 | 4 | 1 | 17 | 1 | 16 | 2 |
| 4 | 4 | 2 | 1506825435299739 | 2017-10-01 02:37:15 | 2 | 119592 | 2017-10-01 03:04:50.575 | 4 | 1 | 17 | 1 | 24 | 2 |
| 5 | 5 | 2 | 1506825435299739 | 2017-10-01 02:37:15 | 2 | 30970 | 2017-10-01 03:05:20.575 | 4 | 1 | 17 | 1 | 24 | 2 |
| 6 | 6 | 3 | 1506825442704740 | 2017-10-01 02:37:22 | 2 | 236065 | 2017-10-01 03:12:16.942 | 4 | 3 | 2 | 1 | 21 | 1 |
| 7 | 7 | 3 | 1506825442704740 | 2017-10-01 02:37:22 | 2 | 236294 | 2017-10-01 03:12:46.942 | 4 | 3 | 2 | 1 | 21 | 1 |
| 8 | 8 | 4 | 1506825528135741 | 2017-10-01 02:38:48 | 2 | 48915 | 2017-10-01 03:02:07.593 | 4 | 1 | 17 | 1 | 17 | 1 |
| 9 | 9 | 4 | 1506825528135741 | 2017-10-01 02:38:48 | 2 | 44488 | 2017-10-01 03:02:37.593 | 4 | 1 | 17 | 1 | 17 | 1 |
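In the sample rows, the leading digits of session_id look like a Unix timestamp that matches session_start. The sketch below tests that reading under two assumptions that the report does not state: the last six digits of session_id are not part of the seconds count, and session_start is expressed in UTC. The function name is hypothetical.

```python
from datetime import datetime, timezone

def session_id_to_start(session_id: int) -> str:
    """Interpret session_id // 10**6 as epoch seconds (assumption) and format in UTC."""
    seconds = session_id // 1_000_000
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")

# session_id values copied from the first and last sample rows.
print(session_id_to_start(1506825423271737))  # 2017-10-01 02:37:03
print(session_id_to_start(1508211379189330))  # 2017-10-17 03:36:19
```

Both results match the session_start values shown for those rows, which suggests that session_id and session_start carry largely the same information.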
Last rows
| | Unnamed: 0 | user_id | session_id | session_start | session_size | click_article_id | click_timestamp | click_environment | click_deviceGroup | click_os | click_country | click_region | click_referrer_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2988171 | 2988171 | 34979 | 1508211369104327 | 2017-10-17 03:36:09 | 7 | 211732 | 2017-10-17 03:40:36.300 | 4 | 3 | 2 | 1 | 25 | 1 |
| 2988172 | 2988172 | 34979 | 1508211369104327 | 2017-10-17 03:36:09 | 7 | 16346 | 2017-10-17 03:43:17.187 | 4 | 3 | 2 | 1 | 25 | 1 |
| 2988173 | 2988173 | 34979 | 1508211369104327 | 2017-10-17 03:36:09 | 7 | 331149 | 2017-10-17 03:45:44.116 | 4 | 3 | 2 | 1 | 25 | 1 |
| 2988174 | 2988174 | 34979 | 1508211369104327 | 2017-10-17 03:36:09 | 7 | 157478 | 2017-10-17 03:46:14.116 | 4 | 3 | 2 | 1 | 25 | 1 |
| 2988175 | 2988175 | 10051 | 1508211372158328 | 2017-10-17 03:36:12 | 2 | 211442 | 2017-10-17 03:38:47.302 | 4 | 3 | 2 | 1 | 25 | 1 |
| 2988176 | 2988176 | 10051 | 1508211372158328 | 2017-10-17 03:36:12 | 2 | 84911 | 2017-10-17 03:39:17.302 | 4 | 3 | 2 | 1 | 25 | 1 |
| 2988177 | 2988177 | 322896 | 1508211376302329 | 2017-10-17 03:36:16 | 2 | 30760 | 2017-10-17 03:41:12.520 | 4 | 1 | 17 | 1 | 25 | 2 |
| 2988178 | 2988178 | 322896 | 1508211376302329 | 2017-10-17 03:36:16 | 2 | 157507 | 2017-10-17 03:41:42.520 | 4 | 1 | 17 | 1 | 25 | 2 |
| 2988179 | 2988179 | 123718 | 1508211379189330 | 2017-10-17 03:36:19 | 2 | 234481 | 2017-10-17 03:38:33.583 | 4 | 3 | 2 | 1 | 25 | 2 |
| 2988180 | 2988180 | 123718 | 1508211379189330 | 2017-10-17 03:36:19 | 2 | 233578 | 2017-10-17 03:39:03.583 | 4 | 3 | 2 | 1 | 25 | 2 |